Abstract

DPLL and resolution are two popular methods for solving the problem
of propositional satisfiability. Rather than algorithms, they are families
of algorithms, as their behavior depends on choices they face during
execution: DPLL depends on the choice of the literal to branch on;
resolution depends on the choice of the pair of clauses to resolve at each
step. The complexity of making the optimal choice is analyzed in this
paper. Extending previous results, we prove that choosing the optimal
literal to branch on in DPLL is $\Delta^p_2[\log n]$-hard, and becomes $NP^{PP}$-hard
if branching is only allowed on a subset of variables. Optimal choice in
regular resolution is both NP-hard and coNP-hard. The problem of
determining the size of the optimal proofs is also analyzed: it is coNP-hard for
DPLL, and $\Delta^p_2[\log n]$-hard if a conjecture we make is true. This problem
is coNP-hard for regular resolution.

1 Introduction 

Several algorithms for solving the problem of propositional satisfiability exist.
Among the fastest complete ones are DPLL and resolution. Both
of them depend on a specific choice to make during execution. DPLL is a form
of backtracking, and therefore depends on how the branching variable is chosen.
Resolution runs by iteratively combining (resolving) two clauses to obtain a
consequence of them, until contradiction is reached or any other clause that can
be generated is subsumed by one already generated. The choice of the variable
to branch on and the choice of the clauses to combine (resolve) are crucial to
efficiency. Formally, both DPLL and resolution are families of algorithms: each
algorithm corresponds to a specific way of making the choices, and its efficiency
can differ greatly from that of the other algorithms of the same family. Making the
right choice is therefore very important for ensuring efficiency. In this paper,
we show that this problem is $\Delta^p_2[\log n]$-hard for DPLL and backtracking,
becomes $NP^{PP}$-hard for restricted-branching DPLL (the variant of DPLL in
which branching is only allowed on a subset of the variables), and is coNP-hard
for regular resolution.

A related problem is that of checking the size of the optimal proofs. Indeed,
while satisfiability of propositional formulae can be proved with a very short
"certificate" (the satisfying assignment), unsatisfiability (probably) requires
exponential proofs, in general. Checking the size of the optimal proofs of an
unsatisfiable formula is important for at least two reasons. First, the unsatisfiability
proof may be needed (for example, as the input of another program), and it is
practically useless if it is too large. Second, if the size of optimal
proofs can be checked efficiently, we may decide to use incomplete methods to
solve the satisfiability problem when the optimal proofs are too large (to
be more precise, if we can efficiently decide whether the formula is either satisfiable
or has a short proof, then we can check the size of the proof to choose the
algorithm). We prove that the problem of proof size is coNP-hard for DPLL
and backtracking ($\Delta^p_2[\log n]$-hard if a conjecture we make is true), $NP^{PP}$-hard
for restricted-branching DPLL, and coNP-hard for regular resolution.

While the problem of making the right choice is the one that has been studied
more in practice, it is the proof size problem that has been investigated more
from the point of view of computational complexity, perhaps because
it is also related to the question of the relative efficiency of proof methods. The
proof size problem is known to be NP-complete [20, 19]. Membership in NP
only holds if the maximal size of proofs is represented in unary notation, while
the results presented in this paper hold for the binary notation. The meaning
of the difference between the binary and unary representations is discussed in the
Conclusions.

A problem that is related to that of the optimal choice is that of automatizability,
which has recently received attention. Roughly speaking, a complete
satisfiability algorithm is automatizable if its running time is polynomial
in the size of the optimal proofs (and, therefore, it generates an almost-optimal
proof). Some recent results have shown that, in spite of some partially positive
and unconditional results, resolution is not automatizable in general [2].

The paper is organized as follows: in Section 2 we give the needed definitions
and some preliminary results; in Section 3 we show the complexity of making the
optimal choice in DPLL and backtracking; in Section 4 we analyze the
restricted-branching version of DPLL; in Section 5 we consider the complexity of making
the optimal choice in resolution. Discussion of the results and comparison with
related work is given in Section 6.

2 Preliminaries 

In this paper we analyze solvers for propositional satisfiability. We assume that
formulae are in CNF, i.e., they are sets of clauses, each clause being a disjunction
of literals. For example, $\{x_1 \vee x_2, \neg x_3\}$ is the set composed of the two clauses
$x_1 \vee x_2$ and $\neg x_3$. We use the following notation: $l \vee F = \{l \vee \gamma \mid \gamma \in F\}$, where
F is a formula and l is a literal.

A propositional interpretation is a mapping from the set of variables to the
set {true, false}. We denote an interpretation by the set of literals containing
$x$ or $\neg x$ depending on whether x is assigned true or false. This notation is
also used to represent partial interpretations: if the set contains neither $x$ nor
$\neg x$, then the variable x is unassigned. We denote the set of models (satisfying
assignments) of a formula F by $Mod(F)$. We denote the cardinality of a set S
by $|S|$; therefore, $|Mod(F)|$ is the number of models of F.

If F is a formula and I is a partial interpretation, $F|I$ denotes the formula
obtained by replacing each variable that is evaluated by I with its value in F
and then simplifying the formula. The resulting formula only contains variables
that I leaves unassigned.
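
The restriction $F|I$ can be illustrated with a minimal Python sketch. The encoding is an assumption of this sketch and not part of the paper: a literal is a nonzero integer (`v` for $x_v$, `-v` for $\neg x_v$), a clause is a frozenset of literals, a formula is a set of clauses, and `restrict` is a hypothetical helper name.

```python
def restrict(formula, interpretation):
    """Return F|I for a CNF formula F and a partial interpretation I.

    Assumed encoding: a literal is a nonzero int (v is x_v, -v is its negation),
    a clause is a frozenset of literals, I is a set of literals.
    """
    result = set()
    for clause in formula:
        if any(lit in interpretation for lit in clause):
            continue  # the clause contains a true literal, so it simplifies away
        # literals made false by I are deleted; unassigned literals are kept
        result.add(frozenset(lit for lit in clause if -lit not in interpretation))
    return result


# F = {x1 v x2, not x3} restricted by I = {not x1} leaves {x2, not x3}
F = {frozenset({1, 2}), frozenset({-3})}
print(restrict(F, {-1}))
```

A clause whose literals are all falsified becomes the empty frozenset, which plays the role of the empty clause (a contradiction).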

Proofs of unsatisfiability as built by the DPLL and backtracking algorithms
are binary trees whose nodes are labeled with variables. We use the recursive definition of
binary trees: a tree is either empty, or is a triple composed of a node and two
trees. Trees will be represented either graphically or in parenthetic notation. In
the parenthetic notation, $()$ denotes the empty tree, and $(x\ T_1\ T_2)$ denotes the
nonempty tree whose root is labeled x and whose left and right subtrees
are $T_1$ and $T_2$, respectively. A leaf is a tree composed of a node and two empty
subtrees, e.g., $(x\ ()\ ())$. The size of a tree is the number of nodes it contains.
At some points, we write "the empty subtrees of T" to indicate any empty tree
that is contained in T or in any of its subtrees. By an inductive argument,
the number of empty subtrees of a tree is equal to the number of its nodes
plus one. The sentence "replace every empty subtree of $T_1$ with $T_2$" has the
obvious meaning. The tree that is denoted by $(x\ T_1\ T_2)$ in parenthetic notation
is graphically represented as in Figure 1.

Figure 1: Graphical representation of the tree $(x\ T_1\ T_2)$.

This graphical representation justifies the use of terms such as "above",
"below", etc., to refer to the relative position of the nodes in the tree. If A is a
tree or a formula, we denote by $Var(A)$ the set of variables it contains.
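
The tree notions used in this section (size, empty subtrees, and replacing every empty subtree) can be made concrete with a small Python sketch. The tuple encoding and the names `size`, `empty_subtrees` and `graft` are assumptions of this sketch: the empty tree is `()` and a nonempty tree is a triple `(x, T1, T2)`.

```python
EMPTY = ()  # the empty tree


def size(tree):
    """Number of nodes of a tree written as () or (x, T1, T2)."""
    if tree == EMPTY:
        return 0
    _, left, right = tree
    return 1 + size(left) + size(right)


def empty_subtrees(tree):
    """Number of empty subtrees contained in the tree."""
    if tree == EMPTY:
        return 1
    _, left, right = tree
    return empty_subtrees(left) + empty_subtrees(right)


def graft(t1, t2):
    """Replace every empty subtree of t1 with t2."""
    if t1 == EMPTY:
        return t2
    x, left, right = t1
    return (x, graft(left, t2), graft(right, t2))


leaf = ("x2", EMPTY, EMPTY)
tree = ("x1", leaf, EMPTY)
print(size(tree), empty_subtrees(tree))  # nodes and empty subtrees of (x1 (x2 () ()) ())
```

On any tree, `empty_subtrees(t)` equals `size(t) + 1`, which is the inductive fact stated above; consequently `size(graft(t1, t2))` equals `size(t1) + (size(t1) + 1) * size(t2)`.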

2.1 Backtracking and the DPLL Algorithm

The DPLL algorithm is a backtracking algorithm working in the search space 
of the partial models, enhanced by three rules. The backtracking algorithm can 
be described as follows: choose a variable among the unassigned ones, and re- 
cursively execute the algorithm on the two subproblems that result from setting 
the value of the variable to false and true. The base case of recursion is when 
either all literals of a clause are falsified (the formula is unsatisfiable), or when 
every clause contains at least one true literal (the formula is satisfiable.) The 
whole formula is satisfiable if and only if either recursive call returns that the 
sub-formula is satisfiable. Unsatisfiability therefore leads to a tree of recursive 
calls, in which unsatisfiability is proved in each leaf. This tree is called the 
search tree of the formula. A formula can have several search trees, each one 
corresponding to a different way of choosing the variables to branch on. 

Definition 1 A backtracking search tree (BST) of a formula F is:

1. the empty tree $()$ if F contains an empty clause (a contradiction);

2. a non-empty tree $(x\ T_1\ T_2)$ otherwise, where $x \in Var(F)$, and $T_1$ and $T_2$
are BSTs of $F|\{\neg x\}$ and $F|\{x\}$, respectively.

DPLL enhances the backtracking procedure with three rules: unit
propagation sets the value of a variable whenever all other literals
of a clause are false; the monotone literal rule sets the value of variables that
appear with the same sign in the whole formula; clause subsumption
removes clauses that are subsumed by other ones. Clause subsumption is not
used in modern implementations of DPLL, and we therefore disregard it. The
search trees of DPLL are similar to those of backtracking.

Definition 2 Let $D(F)$ denote the formula obtained from F by applying the
unit propagation and monotone literal rules. A DPLL search tree (DST) of a
formula F is:

1. the empty tree $()$ if $D(F)$ contains an empty clause (a contradiction);

2. a non-empty tree $(x\ T_1\ T_2)$ otherwise, where $x \in Var(D(F))$, and $T_1$ and
$T_2$ are DSTs of $D(F|\{\neg x\})$ and $D(F|\{x\})$, respectively.
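
A minimal Python sketch of the simplification $D(F)$ follows. It assumes the integer-literal encoding (a clause is a frozenset of nonzero integers, `-v` negating `v`); the names `restrict`, `simplify` and `cnf` are hypothetical, and the order in which forced literals are applied is an arbitrary choice of the sketch rather than something the paper specifies.

```python
def restrict(formula, lit):
    """F|{lit}: drop clauses containing lit and delete the opposite literal."""
    return frozenset(c - {-lit} for c in formula if lit not in c)


def simplify(formula):
    """Sketch of D(F): apply unit propagation and the monotone literal rule
    until neither applies or an empty clause appears."""
    f = frozenset(formula)
    while frozenset() not in f:
        literals = {lit for clause in f for lit in clause}
        units = {next(iter(clause)) for clause in f if len(clause) == 1}
        monotone = {lit for lit in literals if -lit not in literals}
        forced = units | monotone
        if not forced:
            break
        # deterministic tie-break (an arbitrary choice of this sketch)
        f = restrict(f, min(forced, key=lambda lit: (abs(lit), lit)))
    return f


def cnf(*clauses):
    """Build a formula from lists of integer literals."""
    return frozenset(frozenset(c) for c in clauses)


# x1, (not x1 or x2), not x2: unit propagation derives the empty clause
print(frozenset() in simplify(cnf([1], [-1, 2], [-2])))
```

When the result contains the empty frozenset, the first case of Definition 2 applies and the DST is the empty tree.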

An optimal (backtracking or DPLL) search tree of a formula F is a minimal-size
search tree of F (the size of a tree is the number of nodes it contains). A
variable is an optimal branching variable for F if it is the root of an optimal
search tree of F.
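
For very small formulae, the size of an optimal BST and the set of optimal branching variables can be computed by exhaustive search. The sketch below is only an illustration under assumptions: it uses the integer-literal encoding, it takes exponential time, and the names `s`, `restrict`, `cnf` and `optimal_branching_variables` are hypothetical.

```python
import math
from functools import lru_cache


def cnf(*clauses):
    """Build a formula from lists of integer literals (-v negates v)."""
    return frozenset(frozenset(c) for c in clauses)


def restrict(f, lit):
    """F|{lit}: drop clauses containing lit and delete the opposite literal."""
    return frozenset(c - {-lit} for c in f if lit not in c)


@lru_cache(maxsize=None)
def s(f):
    """Size of an optimal BST of f, or math.inf if f is satisfiable."""
    if frozenset() in f:
        return 0  # the empty tree is a BST
    variables = {abs(lit) for clause in f for lit in clause}
    if not variables:
        return math.inf  # no clause left: satisfiable, hence no BST
    return min(1 + s(restrict(f, -v)) + s(restrict(f, v)) for v in variables)


def optimal_branching_variables(f):
    """Variables that are the root of some optimal BST of f."""
    if frozenset() in f or s(f) == math.inf:
        return set()
    variables = {abs(lit) for clause in f for lit in clause}
    return {v for v in variables
            if 1 + s(restrict(f, -v)) + s(restrict(f, v)) == s(f)}


f = cnf([1], [-1], [2, 3])
print(s(f), optimal_branching_variables(f))
```

In this example only the variable of the two complementary unit clauses is an optimal choice: branching on either of the other two variables first leads to a tree of size 3 instead of 1.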

In general, the BSTs and the DSTs of a formula are not the same. For
example, $(x_1\ (x_2\ ()\ ())\ (x_2\ ()\ ()))$ and $(x_1\ ()\ ())$ are a BST and a DST of
$\{\neg x_1 \vee x_2, x_1 \vee \neg x_2\}$, respectively, but not vice versa. Nevertheless, a correspondence
between BSTs and DSTs can be established: for each formula F, we can build
a new one G in such a way that the DSTs of G can be converted into BSTs of F.

Lemma 1 ([21], Lemma 1) Let F be a formula over variables $\{x_1, \ldots, x_n\}$,
F' be obtained from F by replacing every positive literal $x_i$ with $x_i \vee y_i$ and every
negative literal $\neg x_i$ with $\neg x_i \vee \neg y_i$, and G be defined as follows:

$$G = \{x_i \vee \neg y_i,\ \neg x_i \vee y_i \mid 1 \le i \le n\} \cup F'$$

Every BST of F is a DST of G, and every DST of G can be transformed into
a BST of F by replacing each node labeled with $y_i$ with a node labeled with $x_i$
and swapping its two subtrees.

This lemma shows that backtracking can be "simulated" by DPLL: we
can reproduce the backtracking behavior (and, therefore, its search trees) using
DPLL. This result is used to prove the hardness of some problems about DPLL
from the corresponding ones about backtracking. For example, the optimal BST
size of a formula F is equal to the size of the optimal DSTs of the corresponding
formula G; since the translation from F to G can be done in polynomial time,
the problem of finding the optimal DST size for DPLL is at least as hard as the
corresponding problem for backtracking.

We denote by $s(F)$ the size of the optimal search trees of the set of clauses F
if it is unsatisfiable, and $\infty$ otherwise. Whether we consider the backtracking
or the DPLL search trees can be inferred from the context.

2.2 Resolution 

Resolution is a proof method based on the following rule: if $\gamma \vee x$ and $\delta \vee \neg x$ are
two clauses, then $\gamma \vee \delta$ is a consequence of them. This step of generating a new
clause from two others is called the resolution of the two clauses; the generated
clause is called the resolvent. Satisfiability can be established using the fact
that this rule is complete: if a set of clauses is unsatisfiable, then the empty
clause (the clause with no literals) can be generated by repeating the application
of the resolution rule. Efficiency clearly depends on how we choose, at each step,
the pair of clauses to resolve.

The clauses generated to prove unsatisfiability can be arranged into a DAG,
in which the parent of two clauses is their resolvent. The root of this DAG is
the empty clause, and the leaves are the clauses of the original set. If no path
from the root to a leaf contains two resolutions on the same variable,
the resolution proof is called regular. Regular resolution is the process of checking
unsatisfiability using a regular resolution proof.
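
A single resolution step can be sketched in Python as follows. The encoding (clauses as frozensets of nonzero integers, `-v` negating `v`) and the function name `resolve` are assumptions of this sketch.

```python
def resolve(c1, c2, var):
    """Resolvent of c1 (which contains var) and c2 (which contains -var)."""
    if var not in c1 or -var not in c2:
        raise ValueError("c1 must contain var and c2 must contain its negation")
    return frozenset((c1 - {var}) | (c2 - {-var}))


# A two-step refutation of {x1 v x2, not x1 v x2, not x2}
step1 = resolve(frozenset({1, 2}), frozenset({-1, 2}), 1)  # resolve on x1
step2 = resolve(step1, frozenset({-2}), 2)                 # resolve on x2
print(step1, step2)
```

The second resolvent is the empty clause, so the three clauses are unsatisfiable; the proof is regular because each variable is resolved only once on every path.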

2.3 Complexity of What? 

The results in this paper are about the complexity of making choices in the 
DPLL and resolution procedures. Namely, we consider the problem of making 
the first choice optimally. For DPLL and backtracking, the problem is defined 
as follows. 

Name: Optimal branching variable (OBV)

Instance: A formula F and a variable x;

Question: Is x the root of an optimal search tree of F?

This is a decision problem, that is, its solution is either "yes" or "no". 
The related problem of finding an optimal branching variable can be solved by 
checking optimality for all variables of the formula. 

A related problem is that of finding the size of the optimal search trees of a 
given formula. The formal definition is as follows. 

Name: Optimal tree size (OTS)

Instance: A formula F and an integer k in binary notation;

Question: Does F have a search tree of size bounded by k?

The variant of OTS where k is in unary notation has already been investigated
and proved NP-complete [20] (the assumption that k is in unary
is necessary to prove the membership in NP). Assuming that k is in unary
notation means that the complexity of the problem is not measured w.r.t. the
size of the formula F, but rather w.r.t. the size of the proof we are looking
for, which can be exponentially larger. We assume that k is in binary notation
instead. The difference between the binary and unary notation is discussed in
the Conclusions.

About resolution, the problem we consider is whether the resolution of a
pair of clauses is at the leaf level of an optimal regular resolution proof. This
is again the problem of making the first choice optimally when using regular
resolution. Formally, the decision problem we analyze is the following one.

Name: Optimal resolution pair (ORP)

Instance: A formula F and two of its clauses $\gamma$ and $\delta$;

Question: Is there an optimal regular resolution proof of F that contains the
resolution of the leaves $\gamma$ and $\delta$?

We ask whether two clauses are brother leaves of a regular resolution proof (a 
DAG), while for DPLL the question is about the root of a tree. This difference 
is due to the way these procedures build their proofs: DPLL starts from the 
root, resolution starts from the leaves. In both cases, the problem we consider 
is that of making the first choice optimally. 

The problems we have presented in this section will be characterized in
terms of complexity classes. Some of the classes we use are not as well known as
NP and coNP, so we briefly recall their definitions. A machine that works with
an oracle for the class C is a model of computation that can solve a problem in
C in a unit of time. The class $\Delta^p_2$ is the class of problems that can be solved
by a machine that works in polynomial time with an oracle for NP. The class
$\Delta^p_2[\log n]$ is similar, but the oracle can be queried at most a logarithmic
number of times. The class PP contains all problems that can be reduced to
that of deciding whether a propositional formula is satisfied by at least half
of the possible truth assignments of its variables. The class $D^p$ contains all
problems that can be expressed as $L_1 \cap L_2$, where $L_1$ and $L_2$ are in NP and
coNP, respectively.

2.4 Combining Sets of Clauses 

In this section, we prove some general results about BSTs: first, we show a
formula whose optimal BSTs have size within a given range; second, we show
how to combine two sets of clauses while keeping some control on the size of the
optimal BSTs of the result. The first result is simply an adaptation of a result
by Urquhart [30] to backtracking.

Lemma 2 For any given square number m, one can find, in time polynomial
in the value of m, a set of clauses $H_m$ over m variables whose optimal BSTs
have size between $2^{cm}$ and $2^m$, where c is a constant ($0 < c < 1$).

Since the algorithm that finds $H_m$ from m runs in time polynomial in m,
the produced output $H_m$ is necessarily of size polynomial in the value of m.

The first method we use for combining two sets of clauses is the union. If 
two sets do not share variables, the optimal BSTs of their union are simple to 
determine. 

Lemma 3 ([21], Lemma 3) If F and H are two sets of clauses not sharing
any variables, and $F \cup H$ is unsatisfiable, the optimal BSTs of $F \cup H$ are optimal
BSTs of one of the unsatisfiable sets between F and H.

In other words, if either F or H is satisfiable, the optimal BSTs of $F \cup H$
are the optimal BSTs of the other formula. If both F and H are unsatisfiable,
the optimal BSTs of $F \cup H$ are the smallest among the optimal BSTs of F and H.

The second way of combining two sets of clauses is what we call "addition".
This name has been chosen because the size of the optimal BSTs of the
combination is the sum of the sizes of the optimal BSTs of the components
(plus one for the new root).

Definition 3 The sum of two sets of clauses F and H is:

$$F +_x H = (F \vee x) \cup (H \vee \neg x)$$

where x is a new variable not contained in any of the two sets. When we do not
care about the name of the new variable, we omit it and write $F + H$.

We remark that, if either F or H is satisfiable, their sum is satisfiable.

Lemma 4 Let F and H be two sets of clauses built over two disjoint sets of
variables, and let x be a variable not contained in them. If both F and H are
unsatisfiable, x is an optimal backtracking branching variable for $F +_x H$.

Proof. We prove this lemma by induction over the total number of variables
of F and H. The base case is true: if neither F nor H contains any variable,
then they can be unsatisfiable only if they both contain the empty clause (the
contradiction). Their sum is therefore $\{x, \neg x\}$. For the induction case, if the
statement of the lemma holds for $n \ge 0$, and T is an optimal BST of $F +_x H$,
then either x is its root, or it is the root of both its subtrees (because of the
induction hypothesis). In the second case, the tree can be reshaped to have x
in the root. □

The optimal BSTs of $F +_x H$ having x in the root have, as subtrees, optimal
BSTs of F and H. As a result, if $T_1$ and $T_2$ are optimal BSTs of F and H,
respectively, then $(x\ T_1\ T_2)$ is an optimal BST of $F +_x H$. Moreover, the optimal
BSTs of $F +_x H$ have size equal to the sum of the sizes of the optimal search
trees of F and H plus one. Another simple consequence of this lemma is that,
if x is a variable not in F, then x is an optimal backtracking branching variable
of $F +_x \bot = (F \vee x) \cup \{\neg x\}$.

We define the product of two sets of clauses as follows. 

Definition 4 The product of two sets of clauses F and H is:

$$F \cdot H = \{\gamma \vee \delta \mid \gamma \in F \text{ and } \delta \in H\}$$

In the following, we will only consider the product of formulae not sharing
variables. In this case, $F \cdot H$ is unsatisfiable if and only if both F and H are
unsatisfiable.

Lemma 5 If F and H do not share variables and are both unsatisfiable, the
tree obtained by replacing every empty subtree of an optimal BST of F with an
optimal BST of H is an optimal BST of $F \cdot H$.

Proof. The claim is proved by induction on the total number of variables of F
and H. The base case is when the total number of variables of F and H is zero.
Since both these formulae are unsatisfiable, they are both composed of the
empty clause only. Their product is composed of the empty clause only as well, and
the empty tree $()$ is its only BST.

Let us now assume that F is built over n variables while H is built over m
variables. We prove the claim assuming that it holds for any pair of formulae
whose total number of variables is n + m - 1.

Let $(x\ T'_1\ T'_2)$ be an optimal BST of $F \cdot H$. We first consider the case in
which x is a variable of F and then the case in which it is a variable of H. In
both cases, we show that this tree can be modified, without changing its size,
in such a way that it satisfies the statement of the lemma.

If x is a variable of F, the two subtrees $T'_1$ and $T'_2$ are optimal BSTs of
$(F|\{\neg x\}) \cdot H$ and of $(F|\{x\}) \cdot H$, respectively, because x is not a variable of H.
As a result, they have the same size as any other pair of optimal BSTs of these
two formulae. In particular, since these two formulae have n + m - 1 variables,
they have two BSTs $T_1$ and $T_2$ that are as specified in the statement of the
lemma, i.e., $T_1$ is an optimal BST of $F|\{\neg x\}$ where all empty subtrees are replaced with
optimal BSTs of H, and the same for $T_2$. The tree $(x\ T_1\ T_2)$ is therefore a
tree that satisfies the condition in the statement of the lemma. Note that, if
the variables of F and H are not disjoint, what results from this construction is
an optimal BST of F whose empty subtrees are replaced by optimal BSTs of
$H|\{\neg x\}$ and of $H|\{x\}$ instead of optimal BSTs of H.

Figure 2: Search tree of $F \cdot H$.

Let us now assume that x is a variable of H. By definition of BSTs, $T'_1$ and
$T'_2$ are optimal BSTs of $(F \cdot H)|\{\neg x\}$ and of $(F \cdot H)|\{x\}$, respectively, which
are the same as $F \cdot (H|\{\neg x\})$ and $F \cdot (H|\{x\})$, respectively. Since these two
formulae contain n + m - 1 variables, they have two BSTs $T_1$ and $T_2$ that are as
specified in the statement of the lemma. Since $T_1$ and $T_2$ have the same size
as $T'_1$ and $T'_2$, respectively, the tree $T = (x\ T_1\ T_2)$ is an optimal BST of $F \cdot H$.
This tree is as in Figure 2.

The optimal BSTs of $H|\{\neg x\}$ that occur in $T_1$ need not be the same. However, they
all have the same size. As a result, they can all be replaced by the same one. For
the same reason, the optimal BSTs of $H|\{x\}$ that occur in $T_2$ can all be replaced by the same
one. Since x is not a variable of F, the parts of $T_1$ and $T_2$ that lie above these subtrees
are both optimal search trees of F, and the one of $T_2$ can therefore be replaced by the one
of $T_1$. The resulting tree can be rearranged
by adding a number of copies of x below this search tree of F, as shown in Figure 3.

This tree is exactly as specified by the statement of the lemma, and has
been obtained from an optimal BST with transformations that do not modify
the size. □

Figure 3: The rearrangement of the search tree of $F \cdot H$.



A consequence of this lemma is that the size of the optimal backtracking
search trees of $F \cdot H$ is equal to the product of the sizes of the optimal search
trees of F and H, plus their sum.

The following corollary summarizes the results obtained so far.

Corollary 1 There exists a constant c, where $0 < c < 1$, such that for every
positive integer m, there exists a set of clauses $H_m$ such that

$$s(H_m) \in \{2^{cm}, 2^{cm} + 1, \ldots, 2^m\}$$

If F and H are two sets of clauses not sharing variables, then:

$$s(F \cup H) = \min(s(F), s(H))$$

If F and H do not share variables and are both unsatisfiable, then:

$$s(F +_x H) = s(F) + s(H) + 1$$

$$s(F \cdot H) = s(F)\,s(H) + s(F) + s(H)$$
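
The three equalities on union, sum and product can be checked by exhaustive search on tiny formulae. The Python sketch below is only a sanity check under assumptions, not a proof: it uses an integer-literal encoding (`-v` negates `v`), an exponential-time computation of $s$, and hypothetical helper names (`cnf`, `restrict`, `s`, `add`, `product`).

```python
import math
from functools import lru_cache


def cnf(*clauses):
    """Build a formula from lists of integer literals."""
    return frozenset(frozenset(c) for c in clauses)


def restrict(f, lit):
    """F|{lit}: drop clauses containing lit and delete the opposite literal."""
    return frozenset(c - {-lit} for c in f if lit not in c)


@lru_cache(maxsize=None)
def s(f):
    """Size of an optimal BST of f by exhaustive search (inf if satisfiable)."""
    if frozenset() in f:
        return 0
    variables = {abs(lit) for clause in f for lit in clause}
    if not variables:
        return math.inf
    return min(1 + s(restrict(f, -v)) + s(restrict(f, v)) for v in variables)


def add(f, h, x):
    """The sum F +_x H = (F v x) U (H v not x), for a fresh variable x."""
    return frozenset({c | {x} for c in f} | {c | {-x} for c in h})


def product(f, h):
    """The product F . H: one clause for every pair of clauses."""
    return frozenset(c | d for c in f for d in h)


F = cnf([1], [-1])                             # s(F) = 1
H = cnf([2, 3], [2, -3], [-2, 3], [-2, -3])    # s(H) = 3
print(s(F | H), s(add(F, H, 9)), s(product(F, H)))
```

With $s(F) = 1$ and $s(H) = 3$ the corollary predicts the sizes 1, 5 and 7 for the union, the sum and the product, respectively.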



3 DPLL and Backtracking 

In this section, we first show that the results about how to combine formulae
allow improving the current results on the complexity of choosing the branching
literal in DPLL (NP-hardness and coNP-hardness). We then turn to the
problem of search tree size.

Theorem 1 The OBV problem for backtracking is $\Delta^p_2[\log n]$-hard.

Proof. The reduction is from the problem PARITY(SAT): given a sequence
of formulae $\{F_1, \ldots, F_r\}$, decide whether the first unsatisfiable formula of the
sequence has odd index; this problem is $\Delta^p_2[\log n]$-hard [31]. We make the
following simplifying assumptions, which do not affect the complexity of this
problem:

1. r is even;

2. each formula is built over its own alphabet of n variables;

3. both $F_{r-1}$ and $F_r$ are unsatisfiable.

We translate the sequence $\{F_1, \ldots, F_r\}$ into the set of clauses F below.

$$F = G \cup D$$

$$G = \bot +_x (F_1 \cup (H_m + H_m + (F_3 \cup (H_m + H_m + \cdots (F_{r-3} \cup (H_m + H_m + (F_{r-1}))) \cdots))))$$

$$D = H_m + (F_2 \cup (H_m + H_m + (F_4 \cup (H_m + H_m + \cdots (F_{r-2} \cup (H_m + H_m + (F_r))) \cdots))))$$

In this definition, $m = 2n/c$, where c is the constant of Lemma 2. We neglect
the fact that m should be a square number, and assume that each $H_m$ is built
over a private set of variables. Formula F can be built in time polynomial in the
size of the original instance of PARITY(SAT), as each union and sum adds at most
one literal to each clause.

Since F is the union of two sets of clauses G and D not sharing variables,
its optimal BSTs are the minimal ones among those of G and those of D. By
Lemma 4, x is an optimal branching variable of G. It is therefore an optimal
branching variable of F if and only if $s(G) \le s(D)$. What is left to prove is that $s(G)$ is
less than or equal to $s(D)$ if and only if the first unsatisfiable formula of the
sequence has odd index.

Let i be the index of the first unsatisfiable formula of the subsequence of
$\{F_1, \ldots, F_r\}$ composed only of the formulae of odd index, and j the same for
the even indexes. The values of $s(G)$ and $s(D)$ are:

$$s(G) = 1 + \frac{i-1}{2}(2s(H_m) + 2) + s(F_i)$$

$$s(D) = s(H_m) + 1 + \frac{j-2}{2}(2s(H_m) + 2) + s(F_j)$$

These equations can be proved by observing that D can be rewritten as
$H_m + (F_2 \cup (H_m + D'))$, where D' is the formula corresponding to the
sequence $\{F_3, \ldots, F_r\}$. A similar recursive definition can be given for G', where
$G = \bot +_x G'$, i.e., $G' = F_1 \cup (H_m + H_m + G'')$ and G'' is the formula corresponding
to $\{F_3, \ldots, F_r\}$. The equations above can be verified against the recursive
definitions of D and G.

Let us now assume that $i < j$. We have that $i \le j - 1$, therefore:

$$\begin{aligned}
s(G) &\le 1 + \frac{j-2}{2}(2s(H_m) + 2) + s(F_i) \\
     &= s(D) - s(F_j) + s(F_i) - s(H_m) \\
     &< s(D)
\end{aligned}$$

The last step can be done because we have set m in such a way that $s(H_m) > s(F_k)$
for every unsatisfiable formula $F_k$ of the sequence. If $j < i$ we have $j \le i - 1$; therefore:

$$\begin{aligned}
s(D) &\le s(H_m) + 1 + \frac{i-3}{2}(2s(H_m) + 2) + s(F_j) \\
     &= s(H_m) + 1 + \frac{i-1}{2}(2s(H_m) + 2) - 2s(H_m) - 2 + s(F_j) \\
     &= s(G) - s(F_i) - s(H_m) - 2 + s(F_j) \\
     &< s(G)
\end{aligned}$$

Since x is optimal if and only if $s(G) \le s(D)$, the claim is proved. □
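
The size computation in this proof can be replayed numerically. The Python sketch below does not build any formula: it assumes the size rules of Corollary 1 (with `math.inf` standing for a satisfiable formula) and evaluates $s(G)$ and $s(D)$ from the sizes of the formulae of the sequence. The names `size_sum`, `chain` and `sizes` are hypothetical, and the value `h` plays the role of $s(H_m)$, which is assumed larger than every finite $s(F_k)$.

```python
import math


def size_sum(a, b):
    """s(A + B) by Corollary 1; math.inf (a satisfiable operand) propagates."""
    return a + b + 1


def chain(f_sizes, h, k):
    """Size of F_k U (H_m + H_m + (F_{k+2} U ...)), with indices starting at 1."""
    if k + 2 > len(f_sizes):
        return f_sizes[k - 1]  # innermost formula: F_{r-1} or F_r
    rest = size_sum(h, size_sum(h, chain(f_sizes, h, k + 2)))
    return min(f_sizes[k - 1], rest)  # a union takes the minimum


def sizes(f_sizes, h):
    """Return (s(G), s(D)) given s(F_1), ..., s(F_r) and h = s(H_m)."""
    s_g = size_sum(0, chain(f_sizes, h, 1))  # G = bottom +_x (...), s(bottom) = 0
    s_d = size_sum(h, chain(f_sizes, h, 2))  # D = H_m + (...)
    return s_g, s_d


inf = math.inf
# First unsatisfiable formula is F_3 (odd index): i = 3, j = 4
print(sizes([inf, inf, 7, 5, 3, 9], 100))
# First unsatisfiable formula is F_2 (even index): i = 3, j = 2
print(sizes([inf, 4, 7, 5, 3, 9], 100))
```

In the first call $s(G) \le s(D)$, in the second $s(G) > s(D)$, in agreement with the closed forms for $s(G)$ and $s(D)$ given above.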

By Lemma 1, for any formula F we can determine (in polynomial time) a
formula G such that the BSTs of F correspond to the DSTs of G. Replacing
the formula F with the corresponding formula G in the proof above, we obtain
a proof of $\Delta^p_2[\log n]$-hardness of OBV for DPLL.

Corollary 2 The problem OBV is $\Delta^p_2[\log n]$-hard for DPLL.

Let us now consider the OTS problem. This is the problem of deciding
whether a formula has a DPLL proof of size bounded by a number k. This
problem has been proved in NP by Buss by showing a nondeterministic
Turing machine that works in pseudo-polynomial time. The problem is therefore
in NP only assuming that the size k of the required proof is expressed in unary
notation. In fact, we prove that the problem is harder, but we need formulae
whose optimal tree size is exponential. This is why the following coNP-hardness
result does not contradict the proof of membership in NP.

Theorem 2 The problem OTS is coNP-hard for DPLL and backtracking.

Proof. We prove that, given a formula G, its unsatisfiability is equivalent to
the existence of a BST, of size bounded by k, for a formula F, where F and k
can be computed from G in polynomial time.

Namely, $F = G \cup H_m$, where $m = (n + 1)/c$, and $k = 2^n$, where n is the
number of variables of G. If G is satisfiable, then the optimal BSTs of F are
those of $H_m$ and therefore $s(F) = s(H_m) \ge 2^{cm} = 2^{n+1}$, which is greater than
k. If G is unsatisfiable, it has BSTs of size bounded by $2^n$. These trees are
smaller than those of $H_m$. As a result, the optimal trees of F are the optimal
trees of G, whose size is bounded by k.

This result also holds for DPLL thanks to Lemma 1.

The problem being both coNP-hard and NP-hard suggests it may be
$D^p$-hard. In this paper, we show a proof of $D^p$-hardness that however relies on
the existence of formulae whose optimal BST size is known exactly and is
exponential in the size of the formula.

Conjecture 1 (Exponential Exact Formulae) There exists a polynomial-time
algorithm that takes an integer m in unary notation and gives a formula
$L_m$ whose optimal BSTs have size equal to $2^m$.

The validity of this conjecture would allow building, in polynomial time, a
formula whose optimal BST size is k even if k is not a power of two. This
formula F can be built incrementally as follows:

1. start with $F = \bot$;

2. if $s(F) = k$, output F and stop;

3. set F to $F + L_m$, where m is the maximal value such that $s(F) + 2^m + 1 \le k$;

4. go to Point 2.

In a logarithmic number of steps, we end up with a formula whose optimal
BSTs are of size k.

Theorem 3 If the Exponential Exact Formulae Conjecture is true, then the
problem OTS is $D^p$-hard for DPLL and backtracking.

Proof. This theorem is proved by combining the formulae used in the proofs
of NP-hardness and coNP-hardness into a single one. Iwama proved that the
problem of checking whether an unsatisfiable formula has a tree-like resolution
proof of bounded size is NP-hard. Since tree-like optimal resolution proofs are
also optimal backtracking proofs and vice versa, this is also a proof of
NP-hardness for backtracking. A minor technical difference is that the size of the
proof is defined to be the total number of literals in Iwama's proof; however, his
result still holds if the size of the proof is defined to be the number of nodes.

Since the problem is NP-hard, there exist two polynomial-time functions $\alpha$
and $\beta$ such that a formula F is satisfiable if and only if the unsatisfiable formula
$\alpha(F)$ has search trees of size bounded by the integer $\beta(F)$.

We use the problem SAT/UNSAT: given two formulae, decide whether the first
is satisfiable but the second is not. Two formulae F and E are in SAT/UNSAT if
and only if the formula D has search trees of size bounded by the number k.

$$D = ((\alpha(F) \cdot L_r) + E) \cup L_m$$

$$k = \beta(F) \cdot 2^r + \beta(F) + 2^r + 1 + 2^n$$

The numbers r and m are defined as $r = n + 1$ and $m = 2 \log k$, where n is
the number of variables of E.

Let us first assume that E is satisfiable. Since $(\alpha(F) \cdot L_r) + E$ is satisfiable
in this case, the optimal search trees of D are exactly those of $L_m$. Therefore,
$s(D) = s(L_m) = 2^m > k$.

Let us now assume that E is unsatisfiable. Since E contains n variables, we
have $s(E) \le 2^n - 1 < k$. If F is satisfiable, we have $s(\alpha(F)) \le \beta(F)$. As a
result, $s((\alpha(F) \cdot L_r) + E) \le \beta(F) \cdot 2^r + \beta(F) + 2^r + 1 + 2^n - 1 < k$, which implies
$s(D) \le k$.

If E is unsatisfiable and F is unsatisfiable as well, we have $s(\alpha(F)) > \beta(F)$.
As a result, $s(\alpha(F)) \ge \beta(F) + 1$, which implies $s(\alpha(F) \cdot L_r) \ge (\beta(F) + 1) \cdot 2^r +
(\beta(F) + 1) + 2^r = \beta(F) \cdot 2^r + 2^r + \beta(F) + 1 + 2^r > k$. The last inequality
holds because $2^r = 2^{n+1} > 2^n$.

We have therefore proved that E is unsatisfiable and F is satisfiable if and
only if $s(D) \le k$. This proves that the OTS problem is $D^p$-hard. By Lemma 1,
the same complexity result holds for DPLL. □
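
The case analysis of this proof is plain arithmetic on tree sizes, so it can be replayed numerically. The Python sketch below is an illustration under assumptions: it applies the size rules of Corollary 1 directly instead of building formulae, takes $s(L_r) = 2^r$ and $s(L_m) = 2^m$ as granted by Conjecture 1, and replaces $m = 2 \log k$ with the integer `2 * k.bit_length()`, which is a rounding chosen for the sketch. All function names are hypothetical.

```python
import math


def size_sum(a, b):
    """s(A + B) by Corollary 1; math.inf (a satisfiable operand) propagates."""
    return a + b + 1


def size_product(a, b):
    """s(A . B) by Corollary 1, for unsatisfiable A and B."""
    return a * b + a + b


def reduction_sizes(s_alpha, beta, s_e, n):
    """Return (s(D), k) for D = ((alpha(F) . L_r) + E) U L_m.

    s_alpha is s(alpha(F)), beta is beta(F), s_e is s(E) (math.inf if E is
    satisfiable) and n is the number of variables of E.
    """
    r = n + 1
    k = beta * 2**r + beta + 2**r + 1 + 2**n
    m = 2 * k.bit_length()  # integer stand-in for m = 2 log k (assumption)
    s_d = min(size_sum(size_product(s_alpha, 2**r), s_e), 2**m)
    return s_d, k


print(reduction_sizes(10, 10, 7, 3))         # F satisfiable, E unsatisfiable
print(reduction_sizes(11, 10, 7, 3))         # F unsatisfiable, E unsatisfiable
print(reduction_sizes(10, 10, math.inf, 3))  # E satisfiable
```

Only the first case yields $s(D) \le k$, matching the claim that the pair is in SAT/UNSAT exactly when D has a search tree of size bounded by k.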

This hardness result can be used as an intermediate step for the proof of a 
more precise complexity characterization of the OTS problem. 

Theorem 4 // the Exponential Exact Formulae Assumption is true, the prob- 
lem OTS is A2 [log n] -/lard /or DPLL and backtracking. 

Proof. As a consequence of the last theorem, there exists a pair of
polynomial-time functions $\alpha$ and $\beta$ such that $F$ is satisfiable and
$G$ is unsatisfiable if and only if $s(\alpha(F, G)) \le \beta(F, G)$. We use
these two functions for showing that parity(sat) can be reduced to the problem
of search tree size for backtracking.

Given a set of formulae $\{F_1, \ldots, F_r\}$, each built over its private set
of variables, the question of whether the first unsatisfiable formula has odd
index has positive answer if either $F_1$ is unsatisfiable, or $F_1 \wedge F_2$
is satisfiable and $F_3$ is unsatisfiable, or $F_1 \wedge \cdots \wedge F_4$ is
satisfiable and $F_5$ is unsatisfiable, etc. This question can be expressed as
an OTS problem as follows.

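$$D = (\alpha(\mathrm{true}, F_1) + G_1) \cup (\alpha(F_1 \wedge F_2, F_3) + G_3) \cup (\alpha(F_1 \wedge \cdots \wedge F_4, F_5) + G_5) \cup \cdots$$
$$k = \max(\{\beta(\mathrm{true}, F_1), \beta(F_1 \wedge F_2, F_3), \beta(F_1 \wedge \cdots \wedge F_4, F_5), \ldots\}) + 1$$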

where $G_i$ is the formula obtained by adding a number of formulae $F_m$ in such
a way that $s(G_i) = k - 1 - \beta(F_1 \wedge \cdots \wedge F_{i-1}, F_i)$.

Let us first assume that the index $i$ of the first unsatisfiable formula of the
sequence $F_1, \ldots, F_r$ is odd. We have:

$$s(\alpha(F_1 \wedge \cdots \wedge F_{i-1}, F_i)) \le \beta(F_1 \wedge \cdots \wedge F_{i-1}, F_i)$$

Since $s(G_i) = k - 1 - \beta(F_1 \wedge \cdots \wedge F_{i-1}, F_i)$, then
$s(\alpha(F_1 \wedge \cdots \wedge F_{i-1}, F_i) + G_i) = s(\alpha(F_1 \wedge \cdots \wedge F_{i-1}, F_i)) + s(G_i) + 1 \le k$.
Since $D$ is a union that contains a term whose proof size is less than or equal
to $k$, the proof size of $D$ (being the minimal among its terms) is less than
or equal to $k$.

Let us now instead assume that the first unsatisfiable formula of the sequence
is of even index. In this case, for every odd index $i$, either
$F_1 \wedge \cdots \wedge F_{i-1}$ is unsatisfiable, or $F_i$ is satisfiable. As
a result, we have:

$$s(\alpha(F_1 \wedge \cdots \wedge F_{i-1}, F_i)) > \beta(F_1 \wedge \cdots \wedge F_{i-1}, F_i)$$

As a result, $s(\alpha(F_1 \wedge \cdots \wedge F_{i-1}, F_i) + G_i) > k$ for
every odd index $i$. Since all parts of $D$ have optimal search tree size
greater than $k$, the proofs of $D$ all have size greater than $k$. As an
immediate consequence of Lemma ^ the same complexity result holds for DPLL. □
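
The counting argument of this proof can be simulated on abstract sizes. The
sketch below is an illustration under stated assumptions, not the reduction
itself: the value s(α(A, B)) is modeled as β when A is satisfiable and B is
unsatisfiable, and as β + 1 otherwise (the smallest value allowed by the
inequality); the padding formulae are modeled only through their size; all
names are hypothetical.

```python
def first_unsat_is_odd(sat):
    """sat[j] tells whether F_{j+1} is satisfiable; indices are 1-based."""
    for index, is_sat in enumerate(sat, start=1):
        if not is_sat:
            return index % 2 == 1
    return False


def union_within_bound(sat, betas):
    """Return True iff the modeled size of D is at most k.

    betas maps each odd index i to the value beta(F_1 & ... & F_{i-1}, F_i).
    """
    odd_indices = range(1, len(sat) + 1, 2)
    k = max(betas[i] for i in odd_indices) + 1
    term_sizes = []
    for i in odd_indices:
        prefix_sat = all(sat[: i - 1])          # F_1 & ... & F_{i-1} satisfiable
        holds = prefix_sat and not sat[i - 1]   # ... and F_i unsatisfiable
        s_alpha = betas[i] if holds else betas[i] + 1
        s_pad = k - 1 - betas[i]                # size of the padding G_i
        term_sizes.append(s_alpha + s_pad + 1)  # size of alpha(...) + G_i
    return min(term_sizes) <= k                 # a union takes the minimum
```

Under these modeling assumptions, `union_within_bound` agrees with
`first_unsat_is_odd` on every sequence of satisfiability flags.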



4 Restricted-Branching DPLL 

Satisfiability provers are often used for solving real-world problems that can 
be reduced to the problem of satisfiability. Formulae produced this way often 
contain variables whose value can be uniquely determined from the values of 
the other ones. If branching is not allowed on these variables, DPLL not only 
remains a complete satisfiability algorithm, but is even made more efficient in 
most cases [H [TOI El USl EZ| (but not always [TT].'! 

Backtracking is incomplete if we cannot branch over all variables. However,
the algorithm obtained by adding unit propagation to backtracking (or,
equivalently, deleting the monotone literal rule from DPLL) is complete, as
DPLL is. We call this algorithm DPLL-Mono. The search trees it generates are
called DPLL-Mono search trees, abbreviated DMST. The following lemma relates
the search trees of DPLL and of DPLL-Mono.

Lemma 6 Let $F$ be a formula over variables $\{x_1, \ldots, x_n\}$, and let $G$
be defined as follows:

$$G = \{x_i \vee \neg y_i,\ \neg x_i \vee y_i \mid 1 \le i \le n\} \cup F$$

Any DST of $G$ can be transformed into a DMST of $F$ by replacing each $y_i$
with $x_i$.

Proof. The monotone literal rule cannot be used on $G$ because all variables
occur both positive and negative. We have to prove that the same happens for
any partial assignment. Given an assignment, the value of $x_i$ can be inferred
by the monotone literal rule only if one clause between $x_i \vee \neg y_i$ and
$\neg x_i \vee y_i$ is satisfied. This can only happen when either $x_i$ or
$y_i$ is set to a value; if this is the case, unit propagation assigns a value
to the other one. As a result, the monotone literal rule cannot be applied on
$G$, making its DSTs exactly the same as its DMSTs, which are in turn
equivalent to the DMSTs of $F$. □
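
The construction of the lemma is easy to mechanize. The sketch below assumes
that $G$ is obtained from $F$ by adding the two clauses $x_i \vee \neg y_i$ and
$\neg x_i \vee y_i$ named in the proof, for every $i$. Clauses are lists of
nonzero integers (a negative number is a negated variable), $x_i$ is numbered
$i$ and $y_i$ is numbered $n + i$; this encoding and the function names are
illustrative choices, not taken from the paper.

```python
def add_copies(clauses, n):
    """Return G: the clauses of F plus x_i | ~y_i and ~x_i | y_i for each i."""
    links = []
    for i in range(1, n + 1):
        y = n + i                  # hypothetical numbering of the copy y_i
        links.append([i, -y])      # x_i or not y_i
        links.append([-i, y])      # not x_i or y_i
    return links + [list(clause) for clause in clauses]


def pure_variables(clauses, n_vars):
    """Variables among 1..n_vars that do not occur with both polarities."""
    positive = {lit for clause in clauses for lit in clause if lit > 0}
    negative = {-lit for clause in clauses for lit in clause if lit < 0}
    return {v for v in range(1, n_vars + 1)
            if not (v in positive and v in negative)}
```

For instance, `pure_variables([[1, 2], [1]], 2)` reports both variables as
monotone in $F$, while `pure_variables(add_copies([[1, 2], [1]], 2), 4)` is
empty, which is the first step of the proof.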

The next result we prove is that a formula can be modified in such a way that
we can obtain an optimal search tree by branching first on a subset of its
variables of our choice.






Definition 5 Let $F = \{\gamma_1, \ldots, \gamma_m\}$ be a formula over a set of
variables $X \cup Y \cup Z$, such that the value of $Z$ can be obtained from any
truth evaluation of $X \cup Y$ by applying unit propagation in $F$. Let
$X = \{x_1, \ldots, x_n\}$. We define $c_X(F)$ as follows:

$$c_X(F) = \{\gamma_i \vee \neg a \vee \neg b \mid \gamma_i \in F\} \cup \{\neg x_i \vee v_i \mid x_i \in X\} \cup \{x_i \vee v_i \mid x_i \in X\} \cup \{\neg v_1 \vee \cdots \vee \neg v_n \vee a\} \cup \{\neg v_1 \vee \cdots \vee \neg v_n \vee b\}$$

where $a$, $b$, and $\{v_1, \ldots, v_n\}$ are new variables not appearing in $F$.

Once the values of $X \cup Y$ are determined, $v_1, \ldots, v_n$ are set to true
by unit propagation because of $x_i \vee v_i$ and $\neg x_i \vee v_i$; the
variables $a$ and $b$ are set to true by unit propagation because of
$\neg v_1 \vee \cdots \vee \neg v_n \vee a$ and
$\neg v_1 \vee \cdots \vee \neg v_n \vee b$. Simplifying $c_X(F)$ with these
values we obtain $F$. At this point, unit propagation sets the values of $Z$ by
assumption. We can therefore conclude that $F$ is satisfiable if and only if
$c_X(F)$ is. Moreover, if $F$ is unsatisfiable, then restricting branching on
$X \cup Y$ still allows DPLL-Mono to prove that $c_X(F)$ is unsatisfiable.
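
The definition can be turned into a few lines of code. This is a minimal
sketch under an illustrative encoding: clauses are lists of nonzero integers,
and the new variables $v_1, \ldots, v_n$, $a$ and $b$ are numbered from a
caller-supplied index `fresh` (a hypothetical parameter, not from the paper).

```python
def c_x(clauses, xs, fresh):
    """Build c_X(F) for the clauses of F and the list xs of variables of X.

    Returns the new clause list together with the new variables (vs, a, b).
    """
    vs = list(range(fresh, fresh + len(xs)))   # one v_i per x_i
    a = fresh + len(xs)
    b = a + 1
    result = [list(clause) + [-a, -b] for clause in clauses]
    for x, v in zip(xs, vs):
        result.append([-x, v])                 # not x_i or v_i
        result.append([x, v])                  # x_i or v_i
    result.append([-v for v in vs] + [a])      # not v_1 or ... or not v_n or a
    result.append([-v for v in vs] + [b])      # not v_1 or ... or not v_n or b
    return result, vs, a, b
```

With $F = \{x_1, \neg x_1\}$ and $X = \{x_1\}$, the call
`c_x([[1], [-1]], [1], 2)` yields six clauses: the two clauses of $F$ extended
with $\neg a \vee \neg b$, the two clauses linking $x_1$ to $v_1$, and the two
clauses deriving $a$ and $b$ from $v_1$. With $X$ empty, the last two clauses
degenerate to the unit clauses $a$ and $b$.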

What is interesting about $c_X(F)$ is that some optimal DMSTs of it are
obtained by branching on the variables of $X$ before those of $Y$.

Theorem 5 Let $F$ be an unsatisfiable formula over variables $X \cup Y \cup Z$,
such that the value of $Z$ can be obtained from that of $X \cup Y$ by unit
propagation. Restricting branching on the variables in $X \cup Y$, there exists
an optimal DMST of $c_X(F)$ made of a complete tree over $X$ in which trees over
$Y$ replace the empty subtrees.

Proof. By induction on the number of variables of $X \cup Y$. If $F$ contains
no variable, the empty tree is an optimal DMST of it, and the empty tree
satisfies the condition of the theorem. If $F$ contains one variable, either it
is a variable of $X$ or it is a variable of $Y$. The second case is easy to deal
with, as
$c_X(F) = \{y_1 \vee \neg a \vee \neg b,\ \neg y_1 \vee \neg a \vee \neg b,\ a,\ b\}$,
and the empty tree is again an optimal DMST of this formula. If the only
variable $F$ contains is $x_1 \in X$, then
$c_X(F) = \{x_1 \vee \neg a \vee \neg b,\ \neg x_1 \vee \neg a \vee \neg b,\ \neg x_1 \vee v_1,\ x_1 \vee v_1,\ \neg v_1 \vee a,\ \neg v_1 \vee b\}$.
This formula cannot be proved unsatisfiable just by applying unit propagation.
Since branching is allowed only on $x_1$, the tree $(x_1\ ()\ ())$ is the only
DMST of it. This tree satisfies the conditions of the theorem.

Let us now prove the induction case. If the root of an optimal DMST of
$c_X(F)$ is $x_i$, then its left and right subtrees are DMSTs of
$c_X(F)|\{\neg x_i\}$ and of $c_X(F)|\{x_i\}$, which are the same formulae as
$c_X(F|\{\neg x_i\})$ and $c_X(F|\{x_i\})$, respectively. By the induction
hypothesis, these subtrees obey the statement of the theorem, and the claim is
proved.

Let us now consider the case in which the root of an optimal DMST of $c_X(F)$
is a variable $y_i$. If $c_X(F)$ does not contain any variable $x_i$, the
statement of the theorem is true, as $c_X(F)$ is equal to $F$ after the
propagation of $a$ and $b$. If there are some variables $x_i$, setting the value
of $y_i$ does not have any consequence on the other variables. We can therefore
use the induction hypothesis: both subtrees satisfy the condition of the
theorem, as $c_X(F)|\{\neg y_i\} = c_X(F|\{\neg y_i\})$ and
$c_X(F)|\{y_i\} = c_X(F|\{y_i\})$.



Figure 4: An optimal DMST of $F \cdot H$.

The tree $T$ is therefore as represented in Figure 4. This tree can be modified,
without changing either its size or the property of being a search tree, as
follows: replace $T_2$ with $T_1$. This is possible because both trees are
complete, so they have exactly the same set of assignments at the leaves.
Therefore, by suitably changing the position of the trees on $Y$ (i.e.,
$T^1, T^2, \ldots, T^m$) we obtain another search tree, which has exactly the
same size as the original one.

Another step of the transformation is to replace $(y_i\ T_1\ T_1)$ with the tree
obtained by replacing each empty subtree with $(y_i\ ()\ ())$ in $T_1$. This
tree has exactly the same size as the original one, and the same assignments in
the leaves. As a result, by adding the subtrees on $Y$ we still obtain a search
tree, which is shown in Figure 5.




Figure 5: The result of the transformation.

This tree satisfies the condition of the theorem. □

From the shape of the optimal DMSTs of $c_X(F)$, we can infer their size.

Corollary 3 Let $F$ be a formula over $X \cup Y \cup Z$, where $|X| = n$.
Assuming branching is allowed only on $X \cup Y$, we have:

$$s(c_X(F)) = 2^n - 1 + \sum_{X' \text{ is a model over } X} s(F|X')$$
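
On very small instances, the corollary can be compared against an exhaustive
search. The sketch below is only an illustration under several assumptions
that are not stated in this form in the paper: the size of a tree is its number
of internal nodes, a formula refuted by unit propagation alone has size 0, the
corollary is read as $s(c_X(F)) = 2^n - 1 + \sum_{X'} s(F|X')$, and clauses are
lists of nonzero integers. The example formula and all names are hypothetical.

```python
import math


def propagate(clauses, assignment):
    """Unit propagation; returns (extended assignment, conflict flag)."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            open_lits, satisfied = [], False
            for lit in clause:
                if abs(lit) not in assignment:
                    open_lits.append(lit)
                elif assignment[abs(lit)] == (lit > 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not open_lits:
                return assignment, True        # clause falsified
            if len(open_lits) == 1:
                assignment[abs(open_lits[0])] = open_lits[0] > 0
                changed = True
    return assignment, False


def optimal_size(clauses, branch_vars, assignment=None):
    """Least number of internal nodes of a refutation tree (inf if none)."""
    assignment, conflict = propagate(clauses, assignment or {})
    if conflict:
        return 0
    best = math.inf
    for var in branch_vars:
        if var in assignment:
            continue
        size = 1 + sum(
            optimal_size(clauses, branch_vars, {**assignment, var: value})
            for value in (True, False))
        best = min(best, size)
    return best


# X = {1}, Y = {2, 3}; F is unsatisfiable and needs one branch on Y.
F_EXAMPLE = [[2, 3], [2, -3], [-2, 3], [-2, -3]]
# c_X(F) written out by hand, with v_1 = 4, a = 5, b = 6.
C_X_EXAMPLE = [c + [-5, -6] for c in F_EXAMPLE] + [[-1, 4], [1, 4], [-4, 5], [-4, 6]]


def both_sides():
    """Left and right side of the corollary on the example (n = 1)."""
    left = optimal_size(C_X_EXAMPLE, [1, 2, 3])
    right = 2**1 - 1 + sum(
        optimal_size(F_EXAMPLE, [2, 3], {1: value}) for value in (True, False))
    return left, right
```

The search is exponential and only meant for formulae of this size; it is a
sanity check of the statement, not a proof of it.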

The previous theorem also shows that the sum of formulae can be defined as
$F +_x G = c_{\{x\}}((F \vee x) \cup (G \vee \neg x))$ for DPLL-Mono (this is
useful, as it is not clear whether Lemma 0] holds for DPLL-Mono). Indeed, by
Theorem 5, there exists an optimal search tree of $F +_x G$ containing $x$ in
the root, and the two subtrees are optimal DMSTs of $F$ and $G$, respectively.
As a result, the size of the optimal DPLL-Mono search trees of $F +_x G$ is
$s(F) + s(G) + 1$.
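
The size of the sum can also be derived step by step. The derivation below is a
sketch that assumes Corollary 3 in the form
$s(c_X(F)) = 2^n - 1 + \sum_{X'} s(F|X')$, applied with $X = \{x\}$ and $n = 1$,
and that $x$ does not occur in $F$ or $G$.

```latex
\begin{align*}
s(F +_x G)
  &= s\bigl(c_{\{x\}}((F \vee x) \cup (G \vee \neg x))\bigr) \\
  &= 2^1 - 1
     + s\bigl(((F \vee x) \cup (G \vee \neg x))|\{x\}\bigr)
     + s\bigl(((F \vee x) \cup (G \vee \neg x))|\{\neg x\}\bigr) \\
  &= 1 + s(G) + s(F)
\end{align*}
```

Setting $x$ to true satisfies every clause of $F \vee x$ and leaves $G$; setting
$x$ to false satisfies every clause of $G \vee \neg x$ and leaves $F$.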

The property about the union of two formulae $F \cup G$ still holds for
backtracking with unit propagation (the proof is like the one for
backtracking). Formulae $H_m$ can be replaced with formulae whose optimal search
trees have an exact exponential value.

Corollary 4 The optimal DMSTs of $V_n = c_X(X \cup \{y, \neg y\})$, where
$X = \{x_1, \ldots, x_n\}$, have size $2^n - 1$.

Formulae that have exact size can be built easily even if the size is not equal
to $2^n - 1$ for some $n$: using the same construction reported after
Conjecture ^ we can build a formula $I_m$ that has optimal DMST size equal to
$m$ in polynomial time, for every $m > 0$. These formulae allow for reducing the
OTS problem to the OBV problem.

Theorem 6 For restricted-branching DPLL-Mono, the OTS problem can be
polynomially reduced to the OBV problem.

Proof. Given a formula $G$, we know that $s(G) \le k$ if and only if $a$ is the
optimal branching variable of $(\bot +_a G) \cup I_{k+1}$. □

Another consequence of Theorem 5 is the possibility of relating the search
tree size of a formula with the number of models of another one.

Corollary 5 Let $G$ be a formula over $X$. Restricting branching over the
variables in $X \cup \{y\}$, where $y$ is a new variable not in $X$, the size of
the optimal DMSTs of $e_X(G)$ is $2^{n+1} - 1 + 2|Mod(G)|$, where $e_X(G)$ is
defined as follows:

$$e_X(G) = c_X(G \cup \{y, \neg y\})$$

We prove that the problem of search tree size is hard for the class
$\mathrm{NP}^{\mathrm{PP}}$. First of all, we need a complete problem for this
class. We use e-minsat: given a formula $F$ over variables $X \cup Y$, decide
whether there exists a truth assignment over $X$ such that at most half of the
models extending it satisfy $F$. The similar problem where "at most" is replaced
by "at least" is called e-majsat, and is $\mathrm{NP}^{\mathrm{PP}}$-complete
[22]. Proving that e-minsat is complete for the same class is an easy exercise.

Theorem 7 Checking whether the size of the optimal DMST of a formula is
bounded by a number in binary notation is $\mathrm{NP}^{\mathrm{PP}}$-hard for
restricted-branching DPLL-Mono.

Proof. We reduce e-minsat to the problem of search tree size. Let $F$ be a
formula over $X \cup Y$, where $|X| = |Y| = n$. Given a truth evaluation $X'$
over $X$, the number of models of $F|X'$ is related to the formula
$c_Y((F|X') \cup I_1)$ by Corollary 3:

$$s(c_Y((F|X') \cup I_1)) = 2^n - 1 + \sum_{Y' \text{ is a model over } Y} s(((F|X')|Y') \cup I_1) = 2^n - 1 + |Mod(F|X')|$$

The optimal tree size of $c_Y((F|X') \cup I_1)$ linearly depends on the number
of models of $F|X'$. The reduction is completed by the addition of a formula
$I_k$, where $k = 2^n + 2^{n-1}$. Indeed, the formula
$c_Y((F|X') \cup I_1) \cup I_k$ has the following property:

$$s(c_Y((F|X') \cup I_1) \cup I_k) \begin{cases} < k & \text{if } F|X' \text{ has at most half of the models} \\ = k & \text{otherwise} \end{cases}$$

By combining Corollary 3 with the above inequalities, we obtain a way for
summing up the size of the search trees of all formulae $F|X'$. The optimal
search trees of $c_X(c_Y(F \cup I_1) \cup I_k)$ have indeed the following size:

$$s(c_X(c_Y(F \cup I_1) \cup I_k)) \begin{cases} = 2^n - 1 + 2^n k & \text{if all } F|X' \text{ have more than half of the models} \\ < 2^n - 1 + 2^n k & \text{otherwise} \end{cases}$$

This proves that $c_X(c_Y(F \cup I_1) \cup I_k)$ has search trees of size
bounded by $2^n + 2^n k - 2$ if and only if there exists $X'$ such that $F|X'$
has at most half of the models. □
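
The two thresholds of this proof can be checked numerically. The sketch below
works on abstract sizes only and rests on assumptions: the inner size is taken
to be $\min(2^n - 1 + |Mod(F|X')|, k)$ (a union takes the smaller of its two
sizes), the outer size is $2^n - 1$ plus the sum of the inner sizes over all
$X'$, and $k$ is taken to be $2^n + 2^{n-1}$, the value for which the two cases
above separate. Function names are hypothetical.

```python
def inner_size(n, models, k):
    """Modeled size of c_Y((F|X') u I_1) u I_k for one evaluation X'."""
    return min(2**n - 1 + models, k)


def outer_size(n, model_counts, k):
    """Modeled size of c_X(c_Y(F u I_1) u I_k); one model count per X'."""
    return 2**n - 1 + sum(inner_size(n, m, k) for m in model_counts)


def within_bound(n, model_counts):
    """True iff the modeled size is at most 2^n + 2^n * k - 2."""
    k = 2**n + 2**(n - 1)          # assumed value of k
    return outer_size(n, model_counts, k) <= 2**n + 2**n * k - 2


def some_minority(n, model_counts):
    """True iff some X' leaves at most half of the 2^n evaluations as models."""
    return any(m <= 2**(n - 1) for m in model_counts)
```

For $n = 2$, `within_bound(2, [3, 3, 3, 3])` is false, while
`within_bound(2, [3, 2, 4, 3])` is true, matching the answer of e-minsat on the
corresponding model counts.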

Theorem 6 shows that the OTS problem can be polynomially reduced to the
OBV problem. Moreover, Lemma 6 shows that any formula can be translated
into another one whose DSTs are the DMSTs of the original one. This is therefore
a reduction from the OTS problem for DPLL-Mono to the OTS problem for
DPLL.

Corollary 6 The problems OTS and OBV for restricted-branching DPLL are
$\mathrm{NP}^{\mathrm{PP}}$-hard.

$\mathrm{NP}^{\mathrm{PP}}$ contains $\mathrm{P}^{\mathrm{PP}}$ [21], which in
turn contains the whole polynomial hierarchy. As a result, the above theorem
shows that the problem of search tree size is hard for every class of the
polynomial hierarchy.






5 Regular Resolution 



In this section, we consider the problems of the proof size and of the optimal 
choice for regular resolution. We proceed by first checking which results for 
backtracking and DPLL continue to hold for regular resolution, and then proving 
hardness results from them. Formulae having exponential optimal resolution 
proofs exist both for regular and general resolution [221 HSl OOl- The result 
$s(F \cup H) = \min(s(F), s(H))$ holds for resolution: since the clauses of F and
H do not share variables, resolution can only be applied between two clauses 
of F or between two clauses of H. This implies that any optimal resolution 
proof either contains clauses of F only or of H only. It is not clear whether the 
properties of multiplication and sum hold for resolution. 

The problem of proof size is NP-hard because of a result by Iwama 'W (this 
result also holds if the size of a resolution proof is defined to be the number 
of generated clauses instead of the total number of literals). Using the union
of formulae, we can prove that the problem is coNP-hard as well: if $H_m$ is a
formula whose optimal proof size is greater than $2^n$, then the formula
$G \cup H_m$ has proof size less than or equal to $2^n$ if and only if $G$ is
unsatisfiable, where $G$ is a formula over $n$ variables.

Theorem 8 Deciding whether there exists a regular resolution proof of a
formula, of size bounded by a number, is coNP-hard.

Let us now consider the problem of the optimal choice, i.e., whether two 
clauses are brother leaves of an optimal proof. In order to prove a hardness 
result, we need a way for building formulae for which an optimal choice is 
known. 

Lemma 7 Let $F$ be an unsatisfiable formula such that $F \setminus \{\gamma\}$
is satisfiable, $x$ a new variable not in $F$, and $g_x(F)$ the following
formula:

$$g_x(F) = \{x,\ \neg x \vee \gamma\} \cup F \setminus \{\gamma\}$$

All optimal regular resolution proofs of $g_x(F)$ contain exactly one resolution
step involving $x$. Such a step can be pushed to the leaves of the proof.

Proof. Since $\gamma$ is needed to make $F$ unsatisfiable, the clauses $x$ and
$\neg x \vee \gamma$ are both needed to make $g_x(F)$ unsatisfiable. Therefore,
they are both leaves of any resolution proof of $g_x(F)$. We can also show that
some optimal proofs of $g_x(F)$ actually contain the resolution of these two
clauses.

Since $\neg x \vee \gamma$ is a leaf of all regular resolution proofs of
$g_x(F)$, and the root of the proof contains no literals, every path from
$\neg x \vee \gamma$ to the root includes a resolution step that eliminates
$\neg x$. The only clause that can eliminate $\neg x$ is $x$. As a result, every
path from $\neg x \vee \gamma$ to the root contains the resolution of a clause
$\neg x \vee \delta$ with the clause $x$. Since the proof is a DAG, there may be
more than one such path. However, since we assume we are using regular
resolution, no path contains more than one resolution with $x$. Figure 6 shows
an example of such a proof.

Figure 6: A resolution proof of $g_x(F)$.

We show that, if $\neg x \vee \delta_1, \ldots, \neg x \vee \delta_m$ are the
clauses that are resolved with $x$, then all these resolution steps can be
replaced by the single resolution of $x$ with $\neg x \vee \gamma$. This is
possible because, in the path from $\neg x \vee \gamma$ to
$\neg x \vee \delta_i$, the literal $\neg x$ is present in all clauses (because
this is a regular resolution proof).

The transformation is as follows: we first identify the three nodes
$\neg x \vee \delta_i$, $x$, and $\delta_i$ for each $i$; we then remove all
literals $\neg x$ from internal nodes of the DAG; we then replace the leaf
$\neg x \vee \gamma$ with the resolution of $\neg x \vee \gamma$ and $x$. This
leads to a new regular resolution proof, made like the one in Figure 7.

If the number of clauses $\delta_i$ is greater than one, the proof is made
smaller, thus contradicting the assumption of optimality. This proves that the
optimal proofs only contain one resolution step involving $x$. In this case, the
size of the proof is left unchanged by the transformation that pushes the
resolution of $x$ to the leaves of the DAG. □

This lemma tells how to modify a formula in such a way that an initial
resolution step is known, but it only holds when a clause $\gamma$ is known to
be necessary to make the formula unsatisfiable. We now remove this assumption.

Lemma 8 If $F$ is an unsatisfiable formula not containing the variables $x$ and
$y$, all optimal regular resolution proofs of
$f_y^x(F) = \{x,\ \neg x \vee y\} \cup \{\neg y \vee \delta \mid \delta \in F\}$
contain exactly one resolution of $x$, which can be pushed to the leaf level.

Proof. The unit clause $y$ is needed to make
$\{y\} \cup \{\neg y \vee \delta \mid \delta \in F\}$ unsatisfiable, if $F$ is
unsatisfiable. The previous lemma therefore applies. □
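
The relation between the two constructions can be made concrete. The sketch
below assumes that the formula of Lemma 7 is
$\{x, \neg x \vee \gamma\} \cup F \setminus \{\gamma\}$ and that the formula of
Lemma 8 is $\{x, \neg x \vee y\} \cup \{\neg y \vee \delta \mid \delta \in F\}$;
clauses are frozensets of nonzero integers, and the function names are
illustrative.

```python
def resolve(c1, c2, var):
    """Resolvent of c1 (containing var) and c2 (containing -var)."""
    if var not in c1 or -var not in c2:
        raise ValueError("the clauses do not clash on var")
    return (c1 - {var}) | (c2 - {-var})


def g_x(formula, gamma, x):
    """Replace the clause gamma of the formula by x and (not x or gamma)."""
    return (formula - {gamma}) | {frozenset({x}), gamma | {-x}}


def f_y(formula, x, y):
    """Guard every clause with not y, then add x and (not x or y)."""
    guarded = {frozenset(delta | {-y}) for delta in formula}
    return guarded | {frozenset({x}), frozenset({-x, y})}
```

With these definitions, `f_y(F, x, y)` coincides with `g_x` applied to the
guarded clauses plus the unit clause $y$, taking $\gamma = y$, and
`resolve(frozenset({x}), frozenset({-x, y}), x)` gives back the unit clause $y$:
this is the single resolution step on $x$ that the lemma places at the leaves.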

The complexity of the optimal regular resolution pair is characterized as 
follows. 

Theorem 9 The ORP problem is both NP-hard and coNP-hard.

Figure 7: Pushing down the resolution of $x$ and $\neg x \vee \gamma$.

Proof. Let $F$ be a formula on $n$ variables, and let $H_m$ be a formula whose
optimal regular resolution proofs are larger than $2^n$. We show that both the
satisfiability and the unsatisfiability of $F$ can be reduced to the ORP
problem. By the previous lemma, if $F$ is unsatisfiable then the resolution of
$x$ with $\neg x \vee y$ is an optimal choice for $f_y^x(F)$. For the same
reason, the resolution of $w$ with $\neg w \vee z$ is optimal for
$f_z^w(H_m)$.

Consider the formula $f_y^x(F) \cup f_z^w(H_m)$. Since $f_y^x(F)$ and
$f_z^w(H_m)$ do not share variables, every resolution proof of it either
contains only clauses of $f_y^x(F)$ or only clauses of $f_z^w(H_m)$; otherwise,
the proof would not form a connected DAG. Since $H_m$ is unsatisfiable,
$f_z^w(H_m)$ is unsatisfiable as well.

On the other hand, the satisfiability of $f_y^x(F)$ depends on that of $F$. If
$F$ is satisfiable, then $f_y^x(F)$ is satisfiable as well. As a result, the
proofs of $f_y^x(F) \cup f_z^w(H_m)$ are exactly the proofs of the only
unsatisfiable formula of the union, that is, $f_z^w(H_m)$. Resolving $w$ and
$\neg w \vee z$ is therefore an optimal choice, while resolving $x$ and
$\neg x \vee y$ is not.

If $F$ is unsatisfiable, so is $f_y^x(F)$. The optimal proofs of $F$ have size
at most $2^n$. A proof for $f_y^x(F)$ can be obtained by adding $\neg y$ to all
nodes of a proof of $F$, and then resolving its root with the result of the
resolution of $x$ with $\neg x \vee y$. As a result, $f_y^x(F)$ has a regular
resolution proof of size at most $2^n + 2$.

Since $H_m$ is unsatisfiable, so is $f_z^w(H_m)$. Any regular resolution proof
of this formula can be modified in such a way that the resolution of $w$ and
$\neg w \vee z$ is at the leaf level. The rest of the proof is a proof of $H_m$
with the addition of $\neg z$ to all clauses (otherwise, $z$ would be resolved
more than once, leading to a larger proof). As a result, any proof of
$f_z^w(H_m)$ has size greater than or equal to that of $H_m$ plus two.

The smaller of the proofs of $f_y^x(F)$ and of $f_z^w(H_m)$ are therefore the
former ones. We can conclude that, if $F$ is unsatisfiable, then $x$ and
$\neg x \vee y$ is an optimal choice for $f_y^x(F) \cup f_z^w(H_m)$, while $w$
and $\neg w \vee z$ is not. Since we have already proved the converse if $F$ is
satisfiable, the claim is proved. □
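
The comparison of sizes used in the last step can be written as a single chain.
This is a sketch that writes $f_y^x(F)$ and $f_z^w(H_m)$ for the two formulae of
the union (a notational assumption) and uses only the bounds stated in the
proof: proofs of $F$ have size at most $2^n$, and proofs of $H_m$ have size
greater than $2^n$.

```latex
\[
s\bigl(f_y^x(F)\bigr) \;\le\; 2^n + 2 \;<\; s(H_m) + 2 \;\le\; s\bigl(f_z^w(H_m)\bigr)
\]
```

Hence, when $F$ is unsatisfiable, every optimal proof of the union is a proof of
$f_y^x(F)$.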

6 Conclusions 

In this paper, we have strengthened two complexity results about DPLL and
resolution: namely, choosing the best branching variable is not only NP-hard
and coNP-hard [21], but is also $\Delta_2^p[\log n]$-hard;
the problem of proof size is not only NP-hard [^OlEinj but also coNP-hard, 
if the size bound is in binary notation. 

The problem of the search tree size can also be proved to be
$\Delta_2^p[\log n]$-hard by assuming the possibility of building, in polynomial
time, a formula whose optimal search tree size is exactly known and
exponential. While this seems likely, no formal proof of it is in the
literature. Namely, we know how to build, in polynomial time, formulae of
exponential optimal search tree size, but only a lower bound of this optimal
size is known, not the exact value. The possibility of building these formulae
is also related to the similarity of the problems of search tree size and that
of optimal choice: if this is the case, indeed, the problems of optimal choice
and optimal search tree size can be easily reduced to each other.

Let us now compare with other work in the literature. The problem of search 
tree size has been already analyzed for various proof systems. For backtracking, 
this problem has been shown NP-complete |2()lll9l [T]. The membership into NP, 
however, only holds if the number k of the question "is there any proof of size 
bounded by k?" is in unary notation. The intuitive meaning of using the unary
notation is that the proof to search for should be small enough to be stored. 
The binary representation makes sense either when the proof is represented in 
some succinct form, or when we only want to evaluate the proof size (without 
finding it). Most SAT checkers developed in AI, for example, are not aimed at 
producing a proof of unsatisfiability, but only at producing a correct answer. 

A problem that is related to the complexity of choice and of tree size is that
of automatizability of proof systems. A proof system is called automatizable
if a proof can be produced in time that is polynomial in the size of the
optimal proofs (the generated proof can therefore be only polynomially larger
than the optimal ones). The problems of optimal choice and automatizability,
while somewhat close to each other, are however different. Automatizability is
about the time needed to generate the whole proof; the optimal choice problem is
that of making, at each step, the optimal choice. The first question is
"global", as it involves the whole proof; the second one is "local", as it is
about a single step of the proof. Doing a single step may be hard, while the
other steps of the proof are easy: if this is the case, automatizability may be
feasible while the problem of the optimal choice remains hard.

The relative importance of automatizability and optimal choice complexity
depends on the expected application of the satisfiability algorithms. If a
complete proof is required, the running time of a satisfiability checker has to
be measured w.r.t. this output. Therefore, automatizability is important. On the
other hand, many applications |14[ 0S| require a proof only if the formula is
satisfiable (i.e., they require a model if it exists). Most algorithms used in
this case are oblivious: each choice is made neglecting the previous ones. These
algorithms face the choice of the branching variable.

Finally, let us discuss the questions left open. The problems about DPLL
are only known to be hard for classes at the second level of the polynomial
hierarchy, while the only class they are known to belong to is PSPACE. The
results on restricted branching are more precise, as the problems are hard for
all classes of the polynomial hierarchy. Unrestricted branching may be as hard
as restricted branching, but no proof of this claim has been found.

A large gap between the hardness and the membership results is present in
our results for regular resolution. The method used for proving that problems
about DPLL are in PSPACE does not work for resolution. The problem is that
resolution proofs are DAGs, not trees. Therefore, iteratively guessing a choice
and checking the total size does not work, as two nodes of the DAG may have
the same child. This argument is a hint that neither the problem of the optimal
choice nor that of the size is in PSPACE for resolution.